Intro

Movies have been an important aspect of society since their creation. They portray issues that reflect our culture and current events, while also providing entertainment. As a society, we have been witnessing the severe impact of the COVID-19 pandemic on the movie industry, from movie theater closures to film releases being delayed or pushed straight to streaming services. This project seeks to investigate the impact that global events - including the COVID-19 pandemic - have had on a variety of factors within the movie industry. The data from this project is mainly taken from the IMDb Movies Extensive Database, and supplemented with the IMDB 5000 Movie Dataset. Both of these datasets were populated with data from IMDb. We also used data scraped from google trends about interest in 5 pandemic-related movies, which is shown in the line chart below.

Summary paragraph (Ha)

The midpoint_df.csv is a data set we compiled from the IMDB 5000 Movie Dataset and the IMDB Movies Extensive Database for the purpose of calculation and data visualization in this deliverable. It contains data of 1637 movies produced in the US or in conjunction with the US from 1990 to 2020. In details, the data set considers 12 variables, namely Plot Keywords, Genre, Usa Gross Income, Budget, Language, Country, Avg Vote, Reviews From Critics, Date Published, Year, Title And Year.

Limitations

While we tried to be as accurate as possible, there are some limitations to the analysis we performed on our data. The first is that we disregarded inflation of currency, as it only increased 5.4% from 1990 to 2020, which is less than $2000 difference. Additionally, movie production takes a few years to occur, so themes prevalent in global events are not instantaneously reflected in movies that are released during that time period. Furthermore, while we made sure each of our observations had a reported USA gross income, some did not report a budget. For the scatterplot visualization specifically, the datapoints are somewhat crowded and some of the colors used to identify the movie genre are quite similar and difficult to distinguish. However, color palettes did not contain enough different colors for all of the movie genres, and because the scatterplot is interactive, hovering over datapoints will allow for clearer views of the data they represent. For the bar chart, one of the limitations is that we calculated the average revenue by year, rather than the median. This could potentially allow values to be skewed by outliers of movies that earned extremely high or extremely low amounts of revenue.

Summary table (Emiri)

About Table: A lot of our questions revolve around different statistics on movies across different years. Thus, our summary table looks at the number of movies published, average budget, USA gross income and most and least common genre from 1990 to 2020.
Insight: By taking our large data set and summarizing it, we are able to see more specific, cleaned up information. For example, this process revealed that our data pool only had one movie for 1992, which did not report a budget. The table displays trends throughout years as well. From 1993 to 2015, Drama was the most common genre of released movies. There is more variety in the least common genre across the years, however, History seems to come up often. Other than 1992 and 2018, the average budget of movies is in the ten millions. Again, 1992 had only one movie reported with no budget. 2018 has a budget that is significantly less than surrounding years. We can also see that 2020 had a significant decrease in average USA income. Considering the COVID-19 pandemic, global events may have an impact on the movie industry. Another interesting insight about 2020 are the most common and least common genres, Horror and Adventure respectively. Looking at other key events, in 2008, the number of movies, average budget and average USA income do not significantly change. Drama is the most common genre and History is the least common genre. In 2005 when Youtube was introduced, the number of movies increases, hitting 90 for the first time. The most common genre and least common genre do not stand out, being Drama and War respectively. Similar to 2008, 2001, when 9/11 happened, 1997, when Netflix was created, and 1991, the introduction of the World Wide Web, there is no summary value that stands out significantly.

Year Number of Movies Published Average Budget Average USA Gross Income Most Common Genre Least Common Genre
1990 6 14205000 55549289 Comedy Action
1991 5 15694600 17693069 Drama Action
1992 1 NaN 553171 Action Action
1993 9 29187500 82194530 Drama Animation
1994 18 35307692 81764922 Drama Animation
1995 20 39078947 50213473 Drama Animation
1996 30 38945385 53125653 Drama Family
1997 37 44958065 46053871 Drama Family
1998 50 28253488 36550476 Drama Fantasy
1999 41 35045588 30532869 Drama Biography
2000 68 32853279 29899064 Drama History
2001 68 34096143 40879155 Drama Musical
2002 86 33328267 41113047 Drama History
2003 79 27669486 29988844 Drama Animation
2004 65 38646542 47247506 Drama Musical
2005 98 40062415 33537240 Drama War
2006 96 34468125 25528268 Drama Music
2007 85 37419787 42569300 Drama Musical
2008 78 33990769 40956521 Drama History
2009 81 40627029 46274694 Drama History
2010 105 36840266 44933600 Drama History
2011 102 33237211 30267374 Drama History
2012 92 42853995 53299935 Drama History
2013 75 55074094 52736999 Drama Sport
2014 89 38249200 41272191 Drama War
2015 72 44742656 70895405 Drama Fantasy
2016 55 60447647 56733594 Action Family
2017 7 32000000 12497767 Drama Action
2018 5 7600000 95947786 Crime Family
2019 8 47428571 58982142 Action Biography
2020 6 19250000 9301084 Horror Adventure

Scatterplot (Melina)

About Scatterplot: One of the questions we wanted to consider was the popularity of different movie genres over the years, and the below scatterplot answers that question by showing the relationship between the average IMDB Metascore vote (out of 10) and each movie genre in every year from 1990 to 2020. Since most movies in our dataset had multiple genres, the avg_vote for each movie was applied to each of the movies’ genres. Please note that the scatterplot is interactive, so you can hover over each of the data points to see the year, average vote, and genre.
Insight: From the scatterplot, we can see that in 2020 (event: COVID-19 pandemic), the highest rated genre thus far is Thriller and the lowest rated genre thus far is Mystery. In 2008 (event: Great Recession), the highest rated genre was Biography and the lowest rated genre was Fantasy. In 2005 (event: introduction of YouTube), the highest rated genre was History and the lowest rated genre was Musical. In 2001 (event: 9/11 bombing), the highest rated genre was Musical and the lowest rated genre was Sport. In 1997 (event: introduction of Netflix), the highest rated genre was Mystery and the lowest rated genre was Fantasy. And in 1991 (event: introduction of the Internet/World Wide Web), the highest rated genre was Biography and the lowest rated genre was Crime.

Bar chart (Roshni)

About Chart: One of the questions we wanted to answer for our data was how different national events impacted the movie industry. To answer this, this graph looks at the average revenue brought in by the movie industry by year.
Insight: The main years we decided to look at were 1991 (Introduction of the Internet), 1997 (Introduction of Netflix), 2001 (9/11), 2005 (Introduction of Youtube), 2008-2009 (Great Recession), and 2020 (Coronavirus Pandemic). All of the years we looked at, (with the exception of 2001) showed a decrease in earnings from the year prior. Potential explanations for these drops could be competition in forms of entertainment (The internet, netflix and youtube), a decrease in disposable income (recession), or the closue of movie theaters due to fear of spreading covid-19 (coronavirus pandemic)

Line chart (Gisele)

About Chart: This line graph depicts the search results of 5 pandemic related movies relative to the highest point on the chart for their given time frame. I chose this chart because it clearly shows the popularity of these movies over time (from one year after their release date to November 2020) and helps give a sense of how quarantine brought these movies back into the spotlight.
Insight: From this graph we can clearly see a boost in attention towards movies that deal with a pandemic in March 2020, even if they are more fiction than what could happen in reality (see I am Legend and 28 days later that are notably zombie movies). Every movie’s search results more than double in March of 2020 relative to November 2019 with Contagion coming in as the most different with a 9900% increase in that time frame. With respects to how this chart displays data it’s clear to see that the month that these movies have been searched the most, after the year following it’s release date, has been the month that the United States started implementing quarantine orders to get people to stay inside. We can assume from this that the real life events of a global pandemic made people more interested in movies involving the same topic.